94
Binary Neural Architecture Search
The value of selecting an operation ˜rk is the expected reward rewardi we receive when
we take an operation from the possible set of operations. If nk approaches infinity, ˜rk
approaches the actual value of the operation rk. However, the number of operations nk
cannot be infinite. Therefore, we should approximate the actual value as closely as possible
through the variance.
Definition 2. There exists a difference between the estimated probability ˜rk and the actual
probability rk, and we can estimate the variance concerning the value
˜δk =
2 ln N
n
,
(4.3)
where N is the total number of trails.
Proof. Suppose X ∈[0, 1] represents the theoretical value of each independently distributed
operation. n is the number of times the arm has been played up to trial, and pi is the actual
value of the operation in the ith trail. Furthermore, we define p =
i pi
n
and q = 1 −p.
Since the variance boundary of independent operations can represent the global variance
boundary (see the Appendix), based on Markov’s inequality, we can arrive at below :
P[X > p + δ] = P[
i
(Xi −pi) > δ]
= P[eλ
i(Xi−pi) > eλδ]
≤E[eλ
i(Xi−pi)]
eλδ
.
(4.4)
Since we can get 1 + x ≤ex ≤1 + x + x2 when 0 ≤|x| ≤1), E[eλ
i(Xi−pi)] in Eq. 4.4
can be further approximated as follows:
E[eλ
i(Xi−pi)] =
!
i
E[eλ(Xi−pi)]
≤
!
i
E[1 + λ(Xi −pi) + λ2(Xi −pi)2]
=
!
i
(1 + λ2v2
i )
≤eλ2v2,
(4.5)
where v denotes the variance of X. Combining Eq. 4.4 and Eq. 4.5 gives P[X > p +
δ] ≤eλ2v2
eλδ . Since λ is a positive constant, it can be obtained by the transformation of the
values P[X > p + δ] ≤e−2nδ2. According to the symmetry of the distribution, we have
P[X < p −δ] ≤e−2nδ2. Finally, we get the following inequality:
P[|X −p| ≤δ] ≥1 −2e−2nδ2.
(4.6)
We need to decrease δ as operating recommendations increase. Therefore, we choose
2 ln N
n
as ˜δ. That is to say, p −
2 ln N
n
≤X ≤p +
2 ln N
n
is implemented at least with
probability 1−
2
N 4 . The variance value will gradually decrease as the trail progresses, and ˜rk
will gradually approach rk. Equation 4.7 shows that we can achieve a probability of 0.992
when the number of the trail gets 4.
1 −2
N 4 =
⎧
⎪
⎨
⎪
⎩
0.857
N=2
0.975
N=3
0.992
N=4.
(4.7)